Robust Feature Extraction for Speech Recognition in Noisy Environments Based on the Fractional Fourier Transform

1399/11/27 11:12:44

Speech is the main means of communication between human beings, used to convey ideas, feelings and thoughts to others. Speech recognition technology allows a computer to hear human voice commands and to interpret and react to human language. Because noise is present in real environments, real systems face a mismatch between training and test conditions in real-world applications. Noise robustness has been an extensive research subject in automatic speech recognition (ASR) for several decades. In this thesis, to investigate the robustness of the formants of voiced speech frames in different noisy environments, the formants are first extracted using the Linear Prediction Coefficients (LPC) method. Then, by defining a parameter named Mean Square Movement (MSM), the amount of movement of the formants from clean voiced frames to noisy voiced frames (in other words, their robustness) is measured for different noise sources. It is shown that white noise has the highest MSM value at all signal-to-noise ratios, and therefore the greatest impact on the formants of the voiced speech signal. Next, an algorithm is proposed to extract robust features for speech recognition. The proposed structure is based on the fractional Fourier transform and the root function, which is why the feature extraction algorithm is called Fractional Root Coefficients (FrRC). For theoretical justification of the proposed method, a mathematical relationship is derived between the FrRC features of clean speech, noise and noisy speech, and this relationship is compared with the corresponding relationship for the MFCC feature extraction method in different cases. Results from implementing a speech recognition system based on FrRC feature extraction indicate an increase in recognition accuracy compared to other feature extraction methods.
For example, in a noisy environment with babble noise at a signal-to-noise ratio of 10 dB, recognition accuracy increases by 24.6% and 25.3% compared to the LPC and MFCC methods, respectively. To further improve speech recognition accuracy in noisy environments, another algorithm based on the fractional Fourier transform and the Power Normalized Cepstral Coefficient (PNCC) method, called Adaptive Fractional Power Normalized Cepstral Coefficient (AFPNCC), is introduced, analyzed and implemented. In the proposed AFPNCC algorithm, the alpha coefficient of the fractional Fourier transform is selected according to the type and intensity of the noise by a differential evolution optimizer embedded in the structure of the algorithm. Implementation results show improved speech recognition accuracy in both noisy and clean environments. Numerical results from simulating a speech recognition system based on AFPNCC feature extraction show increases in recognition accuracy of 16% and 92% compared to the PNCC and MFCC algorithms, respectively, in a noisy environment with pink noise at a signal-to-noise ratio of 5 dB.
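The formant robustness analysis described in the abstract can be illustrated with a minimal Python sketch. It is not the thesis implementation. The abstract does not give the exact MSM formula, so `mean_square_movement` below assumes a plain mean of squared differences between clean and noisy formant frequencies. The function names, the Hamming window, and the autocorrelation method for LPC are likewise assumptions made for illustration.

```python
import numpy as np


def lpc_coefficients(frame, order):
    """Estimate LPC predictor coefficients with the autocorrelation method."""
    windowed = frame * np.hamming(len(frame))
    n = len(windowed)
    # Biased autocorrelation at lags 0..order.
    r = np.array([np.dot(windowed[:n - k], windowed[k:]) for k in range(order + 1)])
    # Toeplitz normal equations R a = r[1:], with R[i, j] = r[|i - j|].
    R = np.array([[r[abs(i - j)] for j in range(order)] for i in range(order)])
    return np.linalg.solve(R, r[1:order + 1])


def formants_from_lpc(frame, fs, order):
    """Return formant frequency estimates (Hz) from the LPC polynomial roots."""
    a = lpc_coefficients(frame, order)
    # Roots of z^p - a_1 z^(p-1) - ... - a_p, i.e. the poles of 1 / A(z).
    roots = np.roots(np.concatenate(([1.0], -a)))
    # Keep one root from each complex-conjugate pair.
    roots = roots[np.imag(roots) > 1e-6]
    return np.sort(np.angle(roots) * fs / (2.0 * np.pi))


def mean_square_movement(clean_formants, noisy_formants):
    """Hypothetical MSM: mean squared formant shift (Hz^2), assumed definition."""
    clean = np.asarray(clean_formants, dtype=float)
    noisy = np.asarray(noisy_formants, dtype=float)
    return float(np.mean((clean - noisy) ** 2))
```

In use, `formants_from_lpc` would be applied to each voiced frame of the clean and noisy recordings, and the two resulting arrays passed to `mean_square_movement`. Under the assumed definition, a larger value indicates that the noise source moves the formants further from their clean positions.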